Fix issues with chat template application on responses requests #4173
Conversation
There was a problem hiding this comment.
Pull request overview
This PR targets improved OpenAI Responses API request handling in OVMS LLM serving so the existing Python/Jinja chat-template path can reliably consume Responses-format inputs (including tool-calling and reasoning-related items).
Changes:
- Add debug logging before applying the Python/Jinja chat template.
- Extend Responses
inputparsing to accept additional item/content shapes (reasoning summaries, tool-call items, missing/empty content,output_text). - Build a
processedJsonpayload in chat/completions-style (messages+ convertedtools) for the Python/Jinja template path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/llm/py_jinja_template_processor.cpp | Adds a debug log of the incoming request body before applying the chat template. |
| src/llm/apis/openai_responses.cpp | Enhances Responses API parsing and constructs chat/completions-compatible processedJson including tool conversion and tool-call merging. |
035101b to
3ac1cf1
Compare
773cb1a to
0aa9a48
Compare
| return false; | ||
| } | ||
|
|
||
| SPDLOG_DEBUG("Before chat template: \n {}", requestBody); |
There was a problem hiding this comment.
do we want to keep it?
| } catch (const std::exception& e) { | ||
| SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Failed to apply chat template: {}", e.what()); | ||
| return absl::Status(absl::StatusCode::kInvalidArgument, "Failed to apply chat template. The model either does not have chat template or has an invalid one."); | ||
| return absl::Status(absl::StatusCode::kInvalidArgument, absl::StrCat("Failed to apply chat template: ", e.what())); |
There was a problem hiding this comment.
do we want to expose call stack to the user?
0aa9a48 to
2153f52
Compare
2153f52 to
99b08c3
Compare
| mkdir -p ${HOME}/models | ||
| docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v ${HOME}/models:/models --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) openvino/model_server:weekly \ | ||
| --rest_port 8000 --model_repository_path /models --source_model Junrui2021/Qwen3-VL-8B-Instruct-int4 --tool_parser hermes3 --target_device GPU --task text_generation --pipeline_type VLM_CB --allowed_media_domains raw.githubusercontent.com | ||
| --rest_port 8122 --model_repository_path /models --source_model Junrui2021/Qwen3-VL-8B-Instruct-int4 --model_name ovms-model --tool_parser hermes3 --target_device GPU --task text_generation --pipeline_type VLM_CB --allowed_media_domains raw.githubusercontent.com |
There was a problem hiding this comment.
I think this change is unintended
ed876b5 to
6a0baef
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/test/http_openai_handler_test.cpp:3902
- Same issue as parseResponses(): EXPECT_FALSE on parse errors is non-fatal, so the helper can proceed to construct/parse a handler against an invalid document and return a status that may not reflect the original failure clearly. Consider using ASSERT_FALSE for JSON parsing failures (or returning an explicit InvalidArgument status when doc.HasParseError() is true).
doc.Parse(json.c_str());
EXPECT_FALSE(doc.HasParseError()) << json;
std::optional<uint32_t> maxTokensLimit;
uint32_t bestOfLimit = 0;
std::optional<uint32_t> maxModelLength;
auto apiHandler = std::make_shared<ovms::OpenAIResponsesHandler>(
doc, ovms::Endpoint::RESPONSES, std::chrono::system_clock::now(), tokenizer);
return apiHandler->parseRequest(maxTokensLimit, bestOfLimit, maxModelLength);
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (2)
src/test/http_openai_handler_test.cpp:3915
- parseResponses() returns nullptr after ADD_FAILURE() when parsing fails; downstream helpers/tests (e.g., expectResponsesEquivalentToChatCompletions) dereference the returned pointer without asserting non-null, which can crash the test and obscure the real failure. Add an ASSERT_NE(apiHandler, nullptr) at the start of consumers or refactor the helper to fail fatally.
doc.Parse(json.c_str());
if (doc.HasParseError()) {
ADD_FAILURE() << "Failed to parse JSON: " << json;
return nullptr;
}
src/llm/apis/openai_responses.cpp:550
- ProcessedJsonSink::emitStandaloneReasoning() omits the "content" field entirely. Templates that access message.content unconditionally will fail on such messages. Emit "content": "" (empty string) for standalone reasoning turns to keep processedJson compatible with a wider set of templates.
void emitStandaloneReasoning(const std::string& reasoning) {
rapidjson::Value msgObj(rapidjson::kObjectType);
msgObj.AddMember("role", rapidjson::Value("assistant", alloc), alloc);
msgObj.AddMember("reasoning_content", rapidjson::Value(reasoning.c_str(), alloc), alloc);
messagesArray.PushBack(msgObj, alloc);
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
src/llm/apis/openai_responses.cpp:660
- parseInput() logs “Parsed responses input … without mutating request JSON”, but parseResponsesPart() now mutates
docbefore calling parseInput (tools normalization and potentially adding chat_template_kwargs). This debug message is misleading; consider rewording it or moving the tools normalization to after parseInput so the statement remains true.
} else {
return absl::InvalidArgumentError("input is not a string or array");
}
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Parsed responses input directly to chat history without mutating request JSON");
return absl::OkStatus();
src/test/http_openai_handler_test.cpp:708
- This comment says “For Responses, processedJson is always built from chatHistory”, but the updated Responses implementation now builds processedJson directly from the original
input(including reasoning/function_call handling) rather than re-serializing chatHistory. Please update the comment to reflect the current contract to avoid confusion when maintaining these tests.
ASSERT_NE(apiHandler, nullptr);
// For Responses, processedJson is always built from chatHistory.
// For chat/completions with simple text, processedJson is empty (original body is used instead).
// In both cases, the chatHistory should be equivalent.
🛠 Summary
JIRA/Issue if applicable.
Describe the changes.
🧪 Checklist
``